A particle-based policy for the optimal control of Markov decision processes ⋆
نویسندگان
چکیده
When the state dimension is large, classical approximate dynamic programming techniques may become computationally unfeasible, since the complexity of the algorithm grows exponentially with the state space size (curse of dimensionality). Policy search techniques are able to overcome this problem because, instead of estimating the value function over the entire state space, they search for the optimal control policy in a restricted parameterized policy space. This paper presents a new policy parametrization that exploits a single point (particle) to represent an entire region of the state space and can be tuned through a recently introduced policy gradient method with parameter-based exploration. Experiments demonstrate the superior performance of the proposed approach in high dimensional environments.
منابع مشابه
Utilizing Generalized Learning Automata for Finding Optimal Policies in MMDPs
Multi agent Markov decision processes (MMDPs), as the generalization of Markov decision processes to the multi agent case, have long been used for modeling multi agent system and are used as a suitable framework for Multi agent Reinforcement Learning. In this paper, a generalized learning automata based algorithm for finding optimal policies in MMDP is proposed. In the proposed algorithm, MMDP ...
متن کاملA new machine replacement policy based on number of defective items and Markov chains
A novel optimal single machine replacement policy using a single as well as a two-stage decision making process is proposed based on the quality of items produced. In a stage of this policy, if the number of defective items in a sample of produced items is more than an upper threshold, the machine is replaced. However, the machine is not replaced if the number of defective items is less than ...
متن کاملA POMDP Framework to Find Optimal Inspection and Maintenance Policies via Availability and Profit Maximization for Manufacturing Systems
Maintenance can be the factor of either increasing or decreasing system's availability, so it is valuable work to evaluate a maintenance policy from cost and availability point of view, simultaneously and according to decision maker's priorities. This study proposes a Partially Observable Markov Decision Process (POMDP) framework for a partially observable and stochastically deteriorating syste...
متن کاملAvailability analysis of mechanical systems with condition-based maintenance using semi-Markov and evaluation of optimal condition monitoring interval
Maintenance helps to extend equipment life by improving its condition and avoiding catastrophic failures. Appropriate model or mechanism is, thus, needed to quantify system availability vis-a-vis a given maintenance strategy, which will assist in decision-making for optimal utilization of maintenance resources. This paper deals with semi-Markov process (SMP) modeling for steady state availabili...
متن کاملAccelerated decomposition techniques for large discounted Markov decision processes
Many hierarchical techniques to solve large Markov decision processes (MDPs) are based on the partition of the state space into strongly connected components (SCCs) that can be classified into some levels. In each level, smaller problems named restricted MDPs are solved, and then these partial solutions are combined to obtain the global solution. In this paper, we first propose a novel algorith...
متن کامل